A Finite-State Morphological Analyser for Sindhi

Authors

  • Raveesh Motlani
  • Francis M. Tyers
  • Dipti Misra Sharma
Abstract

Morphological analysis is a fundamental task in natural-language processing that feeds other NLP applications such as part-of-speech tagging, syntactic parsing, information retrieval, and machine translation. In this paper, we present our work on the development of a free/open-source finite-state morphological analyser for Sindhi. We have used Apertium’s lttoolbox as our finite-state toolkit to implement the transducer. The system is developed using a paradigm-based approach, wherein a paradigm defines all the word forms and their morphological features for a given stem (lemma). We have evaluated our system on the Sindhi Wikipedia, a large, freely available corpus of Sindhi, and achieved a reasonable coverage of about 81% and a precision of over 97%.
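The paradigm-based approach and the coverage figure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the stems, the paradigm name `noun_regular`, the tag strings, and the helper names are hypothetical and not taken from the paper, a plain dictionary stands in for the transducer that lttoolbox would compile, and coverage is assumed to mean the share of tokens that receive at least one analysis.

```python
from collections import defaultdict

# Hypothetical toy paradigm: a list of (suffix, tags) pairs.
# These entries are illustrative only and do not come from the paper.
PARADIGMS = {
    "noun_regular": [("", "<n><sg>"), ("s", "<n><pl>")],
}

# Hypothetical lexicon mapping each stem to the paradigm it follows.
LEXICON = {"cat": "noun_regular", "dog": "noun_regular"}


def build_analyser(lexicon, paradigms):
    """Expand every stem through its paradigm into a surface-form lookup table."""
    table = defaultdict(list)
    for stem, paradigm_name in lexicon.items():
        for suffix, tags in paradigms[paradigm_name]:
            table[stem + suffix].append(stem + tags)
    return dict(table)


def coverage(tokens, analyser):
    """Share of tokens with at least one analysis (assumed definition)."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in analyser)
    return known / len(tokens)


analyser = build_analyser(LEXICON, PARADIGMS)
tokens = ["cats", "dog", "bird", "dogs"]
print(analyser["cats"])             # ['cat<n><pl>']
print(coverage(tokens, analyser))   # 0.75
```

Adding a new stem to such a lexicon only requires naming its paradigm, which is the main appeal of the approach: the inflection table is written once and shared by every stem that follows it.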

Similar resources

Finite State Morphology and Sindhi Noun Inflections

Sindhi is a morphologically rich language. Morphological constructions include inflections and derivations. Sindhi morphology becomes more complex due to primary and secondary word types, which are further divided into simple, complex, and compound words. Sindhi nouns are marked for number, gender, and case. Finite state transducers (FSTs) quite reasonably represent the inflectional morphology of Sin...

Developing language technology tools and resources for a resource-poor language: Sindhi

Sindhi, an Indo-Aryan language with more than 75 million native speakers, is a resource-poor language in terms of the availability of language technology tools and resources. In this thesis, we discuss the approaches taken to develop resources and tools for a resource-poor language, with special focus on Sindhi. The major contributions of this work include raw and annotated datasets, a POS Tagger,...

Fast Morphological Analysis of Czech

This paper presents a new Czech morphological analyser that takes advantage of Jan Daciuk’s algorithms for minimal deterministic acyclic finite-state automata. The new analyser is six times faster than the current analyser, ajka, at analysis proper, i.e. returning the possible lemmata and tags for a given word form; for some other related tasks the difference is even bigger.

A Two-Level Morphological Analyser for the Indonesian Language

This paper presents our efforts at developing an Indonesian morphological analyser that provides a detailed analysis of the rich affixation process. We model Indonesian morphology using a two-level morphology approach, decomposing the process into a set of morphotactic and morphophonemic rules. These rules are modelled as a network of finite state transducers and implemented using xfst and lexc...

A Morphological Analyser for Machine Translation Based on Finite-state Transducers

A finite-state, rule-based morphological analyser is presented here, within the framework of the machine translation system TAVAL. This morphological analyser introduces specific features that are particularly useful for translation, such as the detection and morphological tagging of word groups that act as a single lexical unit for translation purposes. The case where words in one such group are ...

Publication date: 2016